San Diego State University
Abstract: Racial residential segregation is a longstanding topic of focus across the disciplines of urban social science. Classically, segregation indices are calculated based on areal groupings (e.g. counties or census tracts), with more recent research exploring ways that spatial relationships can enter the equation. Spatial segregation measures embody the notion that proximity to one’s neighbors is a better specification of residential segregation than simply who resides together inside the same arbitrarily-drawn polygon. Thus, they expand the notion of “who is nearby” to include those who are geographically close to each polygon rather than relying on a binary inside/outside distinction. Yet spatial segregation measures often resort to crude measurements of proximity, such as the Euclidean distance between observations, given the complexity and data requirements of calculating more theoretically-appropriate measures, such as distance along the pedestrian travel network. In this paper, we examine the ramifications of such decisions. For each metropolitan region in the U.S., we compute both Euclidean and network-based spatial segregation indices. We use a novel inferential framework to examine the statistical significance of the difference between the two measures, and we then use features of the network topology (e.g. connectivity, circuity, throughput) to explain this difference using a series of regression models. We show that there is often a large, and frequently statistically significant, difference between segregation indices measured by these two strategies. Further, we explain which topology measures reduce the observed gap and discuss implications for urban planning and design paradigms.
An exceedingly common abstraction in applied spatial analysis is the use of euclidean distance as a proxy measure for geographic proximity (which is, itself, often a proxy for the frequency of social interaction). It’s the geographic equivalent of the spherical cow, save that scientists of many different disciplines often fail to realize how simplified it is. While, in general, simple proximity is a reasonable heuristic for understanding Tobler’s Law (Tobler, 1970), the behavioral realities of movement and social interaction in complex urban environments often require a more thoughtful model.
This project examines the relationship between pedestrian network characteristics and the measurement of metropolitan segregation. It examines three questions:
Really quick overview of normative concepts of community and urban design… Daniel Burnham, Le Corbusier, Ebenezer Howard, James Rouse, and… Emily Talen
How do we represent space in social science research?
Classics:
- Sociology uses groups. Neighborhoods or cities are discrete containers that condition social behaviors (Park, Burgess, McKenzie).
- Economics and regional science use distance from the city center, which is ultimately about the transport of goods (von Thünen's model was based on an agricultural economy and moving crops from the agricultural hinterlands into the marketplace where people actually lived).
- Transport connectivity is implicit, but models are high-level in the 1950s, and the abstraction works conceptually. Neither the theory nor the computational power exists yet to examine the role of better measurements of W_{ij}.

Recent work:
- GIS, geography, and spatial econometrics concepts of spatial weights
- Multiscale and/or bespoke neighborhoods in geography and sociology (Hipp & Boessen (2013), van Ham, )
- Street networks in empirical work
  - Grannis shows social interactions are more frequent inside T-communities defined by street networks
  - Roberto uses street networks to measure segregation in a small-scale case study
Now we have both the tools and the logic to test these assumptions and understand the role of abstractions such as Euclidean distance-based measures in our assessment of critical social processes such as residential segregation. Fast graph algorithms allow us to construct more realistic spatial weights matrices, and computational statistics allow us to construct and test realistic null hypotheses about the allocation of urban population groups. Here, we examine the role of street network topology in the appropriate measurement of urban segregation. Our goals are twofold. First, we aim to understand the implications of simple Euclidean distance-based abstractions when conducting formal spatial analyses; that is, do we find substantive differences in results when more realistic concepts of spatial relationships (e.g. network connectivity) are considered? Second, we aim to explore the role of urban design elements (particularly the street network configuration) in widening the gap between analytical abstraction and empirical reality. More simply, we aim to understand whether certain elements of the street network are associated with a greater difference in measured segregation. With this knowledge, urban designers and planners can design for more inclusive communities from the outset.
Classically, space is treated as a discrete concept, by membership in a group (i.e. a school, classroom, neighborhood, or city), where any of these groupings is defined exogenously.
Reardon & O’Sullivan (2004) Reardon et al. (2009) Wong (1997) Bailey (2012) Rey & Folch (2011) O’Sullivan & Wong (2007) Wong (2004) Dawkins (2004)
Lee et al. (2008) Reardon et al. (2008) Bézenac et al. (2022) Olteanu et al. (2019) Östh et al. (2015) Clark et al. (2015)
A long-recognized but understudied element of metropolitan segregation patterns is the role of transport networks, physical barriers, and other factors such as elevation or congestion that change travel behavior, and thus, the expected potential for social interaction in space. For example, work in sociology has shown the importance of street network connectivity in fostering connected social networks inside small urban geographic zones (Grannis, 1998).
Grannis (2005)
Figure 1: Network Distance vs Euclidean Distance in Urban Environments. a — Network Distance vs Euclidean Distance, b — Network Distance vs Euclidean Distance
A depiction of the difference between network travel distance and “as the crow flies” distance is shown in Figure 1. The figure shows an origin marked with an X in the center, and two different polygons representing a one-mile travel distance using different methods. The small polygon depicts the total extent accessible from the origin point when traveling along the pedestrian network, whereas the larger polygon depicts the one-mile buffer representing unconstrained travel. It is immediately apparent in the figure that network-constrained travel covers a much smaller footprint than the Euclidean buffer in the depicted location. Furthermore, the pattern appears to be influenced strongly by the street network and urban design features that characterize the largely suburban region of the San Diego metro area. Instead of a regular grid that facilitates travel in all directions, the street network in Figure 1 includes several insular patterns, cul-de-sacs, and 3-way intersections that help channel traffic in certain directions rather than others. Moreover, the fact that some subdivisions have only a single entrance makes clear how much farther a person would need to travel to reach the homes in certain regions (versus how easily they appear to be reached via the circular buffer).
Recent work by Roberto (2018) shows the importance of considering network distances when measuring segregation using both simulated data and an empirical example in Pittsburgh, PA. That study shows that segregation is consistently higher at all spatial scales when the measure accounts for local network connectivity. As Roberto (2018, p. 28) notes, “even small positive differences in the city-level results are meaningful and suggest that physical barriers facilitate greater separation between ethnoracial groups and higher levels of segregation.” We agree with this assessment, and in what follows, we examine the magnitude of differences between network and simple Euclidean measures in detail. Specifically, we expand upon prior work in three directions. First, we expand the geographic scope by considering every metropolitan region in the United States, rather than a case study of a single city. Second, we adopt a computational inference framework that allows us to assess whether the observed differences between the segregation measures are larger than would be expected by chance. Finally, we explore the relationship between differences in observed segregation and characteristics of the local travel network.
We begin our analysis by computing two sets of segregation indices, adopting the spatial information theory statistic \tilde{H} as our measure of segregation. As Reardon et al. (2008, p. 512) describe, “the index \tilde{H} is a measure of how much less diverse individuals’ local environments are, on average, than is the total population of region,” and reaches its maximum of 1 only when “each individual’s local environment is monoracial.” Here, our goal is to test how sensitive the statistic is to different concepts of the “local environment,” with one concept adopting the simplified assumption of euclidean-based distance measurements, and the other requiring that distance be measured along a pedestrian transport network.
Following Reardon & O’Sullivan (2004) we consider a spatial region R populated by M racial groups indexed by m, with \tau and \pi as population density and proportion, respectively. Here we diverge from the classical notation in the segregation literature and instead adopt conventions more common in spatial econometrics and geographic analysis. Doing so allows us to strengthen the connection between similar concepts in different disciplines as well as gain finer control over the definition of spatial relationships. Since many spatial segregation measures are implemented in GIS and spatial analysis software designed by geographers, clarifying this connection can help ease interdisciplinary adoption and conversation around spatial segregation measures.
Thus, we index locations within R as i and j, and we operationalize the concept of spatial relationships using a spatial weights matrix W_{ij}. By focusing on W_{ij}, we are forced “to specify [our] underlying assumptions about socio-spatial proximity,” following the call by Reardon & O’Sullivan (2004, p. 154) for analysis that “compares segregation levels based on different theoretical bases for defining spatial proximity.” Conceptually, the spatial weights matrix W_{ij} is a connectivity graph that defines the spatial relationship between nodes i and j, and the values w_{ij} encode the intensity of the edge \bar{ij}. Thus, the spatial weights matrix is a useful and flexible representation of the local neighborhood environment because it provides a generic data structure for encoding spatial relationships, where any link function (\phi, following the notation of Reardon & O’Sullivan (2004)) can be used to specify the proximity between units. Formally,
W_{ij} = \phi(D_{ij})
\qquad{(1)}
where \phi is a proximity weighting function and D is a matrix containing pairwise distances for i and j. Classically, W_{ij} is created via binary connectivity between adjacent units, but a wide variety of other continuous specifications are also used in practice (Getis, 2009; Halleck Vega & Elhorst, 2015; Rey & Anselin, 2010), such as the Euclidean distance between observations, or various kernel or distance-decay functions. Critically, the distance-weighting function \phi is distinct from the concept of distance (D) itself, which could be measured in Euclidean/geodesic distance, minutes of congested travel time, meters traveled along the sidewalk, or some generalized measure of utility. Separating these two concepts allows us to consider alternative distance metrics distinctly from alternative decay functions. The local environment for a given feature y at location i can then be measured by its spatial lag, SL, defined as
SL_i = \sum_j w_{ij} y_j
\qquad{(2)}
In the spatial econometrics literature, it is common to exclude the diagonal elements from W_{ij} to differentiate between focal effects and spatial spillovers in regression models, but when the diagonal is filled, then SL_i becomes a consummate measure of the local environment at location i.
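To make Equation 2 concrete, the following minimal sketch computes a spatial lag with the diagonal of W filled. The three-location weights matrix and the population counts are hypothetical values chosen for illustration only; they are not drawn from our data, and the helper name `spatial_lag` is our own.

```python
def spatial_lag(W, y):
    """Return SL_i = sum_j w_ij * y_j for every location i (Equation 2)."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]


# Hypothetical weights for three locations. The diagonal is 1, so each
# location counts fully toward its own local environment.
W = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]

# Hypothetical counts of a single population group at each location.
y = [100, 40, 10]

lag = spatial_lag(W, y)
print(lag)  # [120.0, 92.5, 20.0]
```

Location 1, for example, counts its own 100 residents plus half of the 40 residents at location 2, giving a local environment of 120.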
To compute the spatial multigroup information theory index \tilde{H}, we first calculate local spatially-weighted population proportions as
\tilde{\pi}_{im} = \frac{SL_{im}}{\sum^M_{m=1}{SL_{im}}}
\qquad{(3)}
The density at location i is
\tilde{\tau_i} = \frac{\sum^M_{m=1}{SL_{im}}}{\sum^M_{m=1}\sum^I_{i=1}{SL_{im}}}
\qquad{(4)}
The entropy of the local environment at each location \tilde{E}_i is
\tilde{E}_i = -\sum^M_{m=1}(\tilde{\pi}_{im})\log_M(\tilde{\pi}_{im})
\qquad{(5)}
where M indicates the number of groups in the population. Finally,
\tilde{H} = 1-\frac{1}{TE} \sum^I_{i=1} \tilde{\tau_i}\tilde{E}_i
\qquad{(6)}
where \tilde{H} is the spatial information theory index defined by Reardon & O’Sullivan (2004), in which T denotes the total population and E the entropy of the overall regional population. We perform all calculations using the open-source Python package segregation (Cortes et al., 2020), distributed as part of the Python Spatial Analysis Library (PySAL) (Rey, Anselin, et al., 2021).
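As an illustration of how Equations 3-6 fit together, the sketch below computes the index from a table of spatial lags using only the Python standard library. It is not the implementation in the segregation package. The sketch assumes that E is the entropy of the region's overall group composition and, because \tilde{\tau}_i in Equation 4 already sums to one across locations, it does not divide by a separate total-population term. The function names and the two toy regions are hypothetical.

```python
import math


def entropy(proportions, base):
    """Entropy of a set of proportions, using the convention 0 * log(0) = 0."""
    return -sum(p * math.log(p, base) for p in proportions if p > 0)


def spatial_information_theory_index(lags, counts):
    """Sketch of Equations 3-6.

    lags[i][m]   : spatial lag of group m at location i (Equation 2)
    counts[i][m] : raw count of group m at location i (used for E)
    """
    n_groups = len(counts[0])
    grand_total = sum(sum(row) for row in lags)
    weighted_entropy = 0.0
    for row in lags:
        local_total = sum(row)
        if local_total == 0:
            continue  # an empty local environment contributes nothing
        pi = [v / local_total for v in row]              # Equation 3
        tau = local_total / grand_total                  # Equation 4
        weighted_entropy += tau * entropy(pi, n_groups)  # Equation 5
    group_totals = [sum(row[m] for row in counts) for m in range(n_groups)]
    total = sum(group_totals)
    # Assumption: E is the entropy of the overall regional composition.
    E = entropy([g / total for g in group_totals], n_groups)
    return 1 - weighted_entropy / E                      # Equation 6


# Two hypothetical two-group regions. The lags equal the raw counts here,
# which corresponds to an identity weights matrix (no neighbors).
segregated = [[10, 0], [0, 10]]
integrated = [[5, 5], [5, 5]]
h_segregated = spatial_information_theory_index(segregated, segregated)
h_integrated = spatial_information_theory_index(integrated, integrated)
print(h_segregated, h_integrated)
```

The first region, where every local environment is monoracial, reaches the maximum of 1, while the second, where every local environment mirrors the regional composition, yields 0. The sketch does not handle a region with a single group, for which E is zero.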
To understand the implications of different parameterizations of space, we use blockgroup-level data from the US Census American Community Survey (ACS) 5-year sample (2013-2017) with four mutually-exclusive racial groups (non-Hispanic white, non-Hispanic Black, Hispanic, and Asian). Our sample contains data for 380 metropolitan Core Based Statistical Areas (CBSAs) in the United States. Blockgroups are the smallest geographic unit for which racial and ethnic data are available in the ACS. To compute Euclidean-based spatial segregation measures, our distances are measured between blockgroup centroids; to compute network-based spatial segregation measures, we first attach the blockgroup centroids to the nearest intersection in the travel network, then compute the shortest network-based path between each pair of observations.
Our data on street networks are collected from OpenStreetMap, and the shortest network path is computed using the Python package pandana (Foti et al., 2012). To operate efficiently on metropolitan-scale street networks, the pandana package relies on a graph pre-processing technique known as contraction hierarchies, which speeds up the routing algorithm by precomputing shortcuts so that inconsequential nodes can be skipped during the search.
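The contrast between the two distance concepts can be illustrated on a toy network. The sketch below uses a plain Dijkstra search from the Python standard library rather than pandana or contraction hierarchies, so it shows the quantity being computed and not the method used in our analysis. The node coordinates, edge lengths, and function name are hypothetical, and the nodes are assumed to be already snapped to intersections.

```python
import heapq
import math


def network_distances(edges, source):
    """Shortest-path length from source to every reachable node (Dijkstra)."""
    adjacency = {}
    for u, v, length in edges:
        # Pedestrian edges are treated as undirected.
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, length in adjacency.get(u, []):
            candidate = d + length
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(queue, (candidate, v))
    return dist


# Hypothetical coordinates in meters: A and B are the ends of two cul-de-sacs
# that are close as the crow flies but connect only through a collector
# street running between C and D.
coords = {"A": (0, 0), "B": (100, 0), "C": (0, 400), "D": (100, 400)}
edges = [("A", "C", 400.0), ("C", "D", 100.0), ("D", "B", 400.0)]

euclidean_ab = math.dist(coords["A"], coords["B"])
network_ab = network_distances(edges, "A")["B"]
print(euclidean_ab, network_ab)  # 100.0 900.0
```

In this toy layout the two homes are 100 meters apart in a straight line but 900 meters apart on foot, which is the kind of gap that the single-entrance subdivisions in Figure 1 produce.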
In each metropolitan region, we proceed by creating two different spatial weights matrices by varying the way distance is measured between observations. In both matrices, the proximity-weighting function \phi is a simple linear decay (triangular kernel) encoding a spatial weight that decreases with distance up to a threshold of two kilometers, beyond which observations no longer have an effect (that is, r=2000 meters):
\phi=
\begin{cases}
1- \left( \frac{d_{ij}}{r} \right),& \text{if } d_{ij}\leq r \\
0 & \text{otherwise}
\end{cases}
\qquad{(7)}
Between the two W matrices, however, we vary the input distance matrix D between two concepts, network distance (defined as the shortest path along the pedestrian transportation network) and Euclidean distance, yielding W_{net} and W_{euc}, respectively. In both matrices the diagonal is set to one, indicating that there is no spatial discount for the value located at observation i. Using these weights matrices W_{net} and W_{euc} to build local environments for each metropolitan region in Equation 1 propagates the two constructs through Equations 2-6, yielding two segregation measures, \tilde{H}_{net} and \tilde{H}_{euc}, and, implicitly, a difference between the two, \Delta_{\tilde{H}} = \tilde{H}_{net} - \tilde{H}_{euc}. The relative difference between segregation measures is the difference divided by the Euclidean measure, \Delta_{pct} = \frac{\Delta_{\tilde{H}}}{\tilde{H}_{euc}}.
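A minimal sketch of Equation 7 and the two weights matrices follows. The pairwise distances are hypothetical values in meters, chosen so that the network distances are longer than their Euclidean counterparts, and the function names are our own.

```python
def triangular_weight(d, r=2000.0):
    """Equation 7: linear decay from 1 at d = 0 to 0 at the threshold r."""
    return 1.0 - d / r if d <= r else 0.0


def weights_from_distances(D, r=2000.0):
    """Apply the kernel to every entry of a pairwise distance matrix."""
    return [[triangular_weight(d, r) for d in row] for row in D]


def relative_difference(h_net, h_euc):
    """Delta_pct: the difference in indices divided by the Euclidean index."""
    return (h_net - h_euc) / h_euc


# Hypothetical pairwise distances (meters) among three blockgroup centroids.
D_euc = [[0.0, 500.0, 1500.0], [500.0, 0.0, 1000.0], [1500.0, 1000.0, 0.0]]
D_net = [[0.0, 800.0, 2600.0], [800.0, 0.0, 1800.0], [2600.0, 1800.0, 0.0]]

W_euc = weights_from_distances(D_euc)
W_net = weights_from_distances(D_net)
print(W_euc[0])  # [1.0, 0.75, 0.25]
print(W_net[0])  # third entry is 0.0 because 2600 m exceeds the threshold
```

Because the distance from an observation to itself is zero, the kernel sets the diagonal to one without any special handling. Note also that the first and third locations are neighbors under the Euclidean matrix but not under the network matrix, so the two matrices can differ in which units enter a local environment and not only in how heavily they are weighted.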
We assess the importance of considering network distance in segregation measurement by adopting the inferential framework outlined in Rey, Cortes, et al. (2021) and Cortes et al. (2020). The framework takes a computational approach to statistical inference, using random labelling to compare the observed difference between the two segregation measures (network versus Euclidean) to a distribution of differences generated from the same data. More specifically, the measures \tilde{H}_{net}, \tilde{H}_{euc}, and \Delta_{\tilde{H}} are computed and recorded for each metro region. As a result of this process, two “spatialized” versions of the metropolitan demographic composition are created, with one dataset representing Euclidean distances and the other representing network-based distances.
We then create two synthetic datasets by pooling the input units from both original datasets and reassigning them at random. For each blockgroup, we randomly reassign the labels (net, euc) to the observed spatial lags from Equation 2. Once all units have been assigned to a group, the segregation measures are re-computed and their difference taken. This process is repeated for 10,000 iterations. By comparing the observed difference between the two segregation measures against a distribution of differences generated via synthetic datasets, we can develop pseudo p-values based on a standard t-test. Our test in this case measures the empirical likelihood of obtaining the observed difference at random under the null hypothesis that the observed difference is within the standard range of differences [1]. The pseudo p-values represent the probability of obtaining results in which the simulated difference was greater than the observed difference \Delta_{\tilde{H}}.
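The random labelling logic can be sketched as follows. This is a simplified illustration and not the routine in the segregation package: it swaps the (net, euc) labels of each unit with probability one half, takes the statistic as an arbitrary function supplied by the caller, and reports the share of simulated differences at least as large as the observed one (with the usual plus-one correction) rather than a t-test. The per-unit values and the use of a simple mean as a stand-in for \tilde{H} are hypothetical.

```python
import random


def random_labelling_test(lags_net, lags_euc, statistic, iterations=1000, seed=0):
    """Compare an observed difference with differences under random labelling."""
    rng = random.Random(seed)
    observed = statistic(lags_net) - statistic(lags_euc)
    simulated = []
    for _ in range(iterations):
        pseudo_net, pseudo_euc = [], []
        for a, b in zip(lags_net, lags_euc):
            # Randomly reassign the (net, euc) labels for this unit.
            if rng.random() < 0.5:
                a, b = b, a
            pseudo_net.append(a)
            pseudo_euc.append(b)
        simulated.append(statistic(pseudo_net) - statistic(pseudo_euc))
    # Assumption: a one-sided empirical pseudo p-value with a +1 correction.
    exceed = sum(1 for s in simulated if s >= observed)
    pseudo_p = (exceed + 1) / (iterations + 1)
    return observed, simulated, pseudo_p


def mean(values):
    """Stand-in statistic; the analysis would use the segregation index here."""
    return sum(values) / len(values)


# Hypothetical per-unit values under the two distance concepts.
values_net = [0.90, 0.80, 0.95, 0.85]
values_euc = [0.60, 0.55, 0.70, 0.50]

observed, simulated, pseudo_p = random_labelling_test(
    values_net, values_euc, mean, iterations=200, seed=42
)
print(observed, pseudo_p)
```

Fixing the seed makes the simulated distribution reproducible, which is useful when the same test is repeated across many metropolitan regions.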
Although the correlation between planar and network-based segregation measures is \rho=0.987, our results provide clear evidence that the choice of distance metric plays an important role in the computation of a spatial segregation index. In all but four cases, we show that segregation is higher when measured according to network distance than by pure Euclidean distance [2] (none of the four cases is significantly different from a random pooling of the same data). Among the 380 CBSAs in our dataset, 25.3% have a difference between Euclidean and network-based segregation measures that is significant at the \alpha=0.05 level, and 14.2% of the CBSAs are significant at the \alpha=0.01 level. Descriptive statistics of the differences between segregation measures in each metro are shown in Table ¿tbl:diff_descriptives?, and the 54 CBSAs significant at the one percent level are listed in Table ¿tbl:one_pct_diffs?. Among these 54 CBSAs, eight metros are located in California, twice the number of the next-most prevalent state (Texas).
The shape of the distribution of differences is approximately normal. While the absolute difference between the two segregation measures in each CBSA can appear small, the relative difference is often reasonably large, with the network-based segregation measure approximately 20% higher than the Euclidean-based measure on average. The largest relative difference is 69% (Carson City, NV).
The travel infrastructure in a metropolitan region serves as its skeleton for both urban development and social interactions. For decades, scholars have worked to quantify the aspects of urban form that help explain behaviors such as travel mode choice (Ewing & Cervero, 2010). A recent evolution of this work is the conception of a travel network as a formal graph structure (Boeing, 2018a, 2018b; Fleischmann, 2018; Fleischmann et al., 2021), and a set of software tools that facilitate its analysis (Boeing, 2016; Fleischmann, 2019).
We use OSMnx and momepy to compute measures of the pedestrian travel network, which is collected from OpenStreetMap.
To understand how urban design decisions such as the topology of the travel network may affect the ability of residents to interact (as measured by the segregation index), we regress the difference in measured segregation on measures of the network graph structure.
Our two-value test is doing a good job (i.e., it is picking up a difference)
The ps_inter is an interaction term between the planar_measure and whether the two-value test was significant. This tells us that the slope is larger for those cities where the difference in the two-value test is significant.
The pct difference generally declines with the overall level of segregation and network size (as measured by street_length), although the latter association appears to be driven by the places with significant two-value tests.
There are two additional parameters worth exploring: the distance-decay function \phi, and the radius that defines the extent of the local environment r.
In the segregation literature, the importance of space has long been recognized, but a full grasp of its implications still eludes researchers. In this paper, we show that when considering the role of transportation infrastructure in segregation measurement, we obtain substantially different results than classic spatial approaches that adopt euclidean measurements.
In future work, this research could be extended in several directions.
One promising direction is the consideration of alternative impedance measures when calculating shortest-path distances along the travel network. In the present study, we assume a constant rate of travel consistent with the average walking pace, and that impedance is reflected by graph distance alone. Alternative constructs could include elevation along with distance to get a more complete measure of the effort required to traverse by foot or bicycle. Similarly, the travel network could also be extended to include public transportation or (potentially congested) automobile travel. These considerations would require extensive additional data, which may limit the capacity for cross-sectional comparisons, but would also provide insight into alternative concepts of space and distance.
Another important avenue for further work is the blending of multiple graphs for a more complete understanding of multi-contextual segregation. For example, children who live in a given neighborhood are simultaneously embedded in local neighborhood contexts, school catchment boundaries, and other local institutions such as religious and community organizations. Each of these contexts has partially-overlapping, occasionally nested, and often imperfectly-defined geographic boundaries, a full synthesis of which requires the development of new methods that integrate across these contexts (Galster, 2001; Galster, 2019). As one example, Wolf (2021) provides a technique for blending multiple graphs together, one spatial and one aspatial, and similar methods could possibly be used to integrate multiple contexts. Work along these lines would also help address the call by Reardon & O’Sullivan (2004, p. 156) for metrics that help understand bridges across social networks.
[1] Note that this does not explicitly require the null \Delta_{\tilde{H}}=0. Instead, the “null value” is the mean of the simulated parameter distribution.
[2] For each CBSA in our sample, our Euclidean distances are based on UTM coordinate systems, with each region’s data projected into its appropriate UTM zone.